Open Domain Question Answering


PARSE: An Open-Domain Reasoning Question Answering Benchmark for Persian

Feb 01, 2026

Toward Cognitive Supersensing in Multimodal Large Language Model

Feb 02, 2026

MedAraBench: Large-Scale Arabic Medical Question Answering Dataset and Benchmark

Feb 02, 2026

Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning

Feb 02, 2026

Benchmarking Uncertainty Calibration in Large Language Model Long-Form Question Answering

Jan 30, 2026

MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

Jan 30, 2026

Automated Benchmark Generation from Domain Guidelines Informed by Bloom's Taxonomy

Jan 28, 2026

TSAQA: Time Series Analysis Question And Answering Benchmark

Jan 30, 2026

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Jan 29, 2026

Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels

Jan 29, 2026